Abstract: Myopathy and neuropathy are, respectively, non-progressive
and progressive neuromuscular disorders that weaken the muscles and
the nerves. Electromyography (EMG) signals are biosignals obtained
from individual muscle cells. EMG-based diagnosis of neuromuscular
disorders is a safe and reliable method, and integrating EMG signals
with machine learning techniques improves the diagnostic accuracy.
The proposed system analyses a clinical raw EMG dataset obtained from
the publicly available PhysioNet database. The two-channel raw EMG
recordings of healthy, myopathy and neuropathy subjects are divided
into samples, and Time Domain (TD) features are extracted from the
samples of each subject. The extracted features are annotated with a
class label representing the state of the individual, and the
annotated features are split into training and testing sets in a
70:30 ratio. A comparative classification analysis is performed on
the complete annotated feature set and on the prominent feature set
selected using the Pearson correlation technique. The features are
also scaled using the standard scaler technique, and the analysis is
repeated on the scaled annotated feature set and the scaled prominent
feature set. The hyperparameter space of each classifier is chosen by
trial and error, the hyperparameters are tuned using Bayesian
optimization, and the optimal parameters obtained are fed to the
classifier. The classification algorithms considered are Random
Forest and Multi-Layer Perceptron Neural Network (MLPNN). The
performance of the classifiers on the test data is evaluated using
Accuracy, Confusion Matrix, F1 Score, Precision and Recall. The
results show that Random Forest performs better than MLPNN on
non-scaled TD features, with an accuracy of 96%, whereas MLPNN
outperforms Random Forest on scaled TD features, with an accuracy of
97%, which is higher than the existing systems. The inference from
the evaluation results is that classifiers tuned with Bayesian
optimization improve the accuracy and provide a robust diagnostic
model for neuromuscular disorder diagnosis.


Keywords—EMG, Bayesian optimization, MLPNN, 
Random Forest 


I INTRODUCTION


Electromyography (EMG) signals are generated by the muscle cells and
are measured as electric potentials [1][13]. EMG-based diagnosis of
neuromuscular disorders is reliable because the recorded signal is
the difference in electric potential between the electrodes, which is
produced by the subject's muscle cells while performing movements
[1][2]. Myopathy is a non-progressive neuromuscular disorder that
affects the muscle cells. Neuropathy is a progressive neuromuscular
disorder that affects the nerve cells [3]. Integrating machine
learning algorithms with EMG signals improves the speed and accuracy
of diagnosis.


The first drawback identified in the existing systems is the lack of
analysis of classifier hyperparameter tuning, which is the most
essential step in improving the accuracy of the model. The second
drawback is the lack of analysis of the impact of features using
feature selection techniques.


The paper is organized as follows. Section II discusses the related
works on neuromuscular disorder diagnosis methods. Section III
presents the methodologies of the proposed framework, with a block
diagram covering all processes. Section IV provides the results and
the inferences drawn from them. Section V covers the conclusion and
future work.


II RELATED WORKS


Kehri et al. [1] present an EMG signal analysis for the diagnosis of
myopathy and neuropathy. The analysis focuses on decomposition of the
signals using the wavelet transform (WT). Statistical features are
extracted from the decomposed signals. Two classification algorithms,
Artificial Neural Network (ANN) and Non-Linear Support Vector Machine
(SVM), are used for discriminating myopathy and neuropathy subjects
from healthy subjects. V. Kehri et al. [2] provide a review of
different feature extraction and classification techniques. The
analysis focuses on the classification of neuromuscular disease from
EMG signals using various combinations of feature extraction
techniques. Amit Kumar Singh et al. [3] discriminate the
neuromuscular disorder EMG signals by decomposing the signals using
Empirical Mode Decomposition (EMD), from which the features mean,
standard deviation, variance and entropy were obtained. Swaroop R. et
al. [5] present a back-propagation algorithm that classifies healthy,
myopathy and neuropathy EMG signals. I. Elamvazhuthi et al. [15]
present a classification of neuromuscular disorders using an
Artificial Neural Network (ANN) based on the features Auto Regressive
(AR), Root Mean Square (RMS), Zero Crossing (ZC), Mean Absolute Value
(MAV) and Waveform Length (WL).


III METHODOLOGIES 


Figure 1 depicts the block diagram of the proposed framework and
shows the techniques used in each process.

Figure 1 Block Diagram of the Proposed Framework


A. Dataset Description
The EMG data has been retrieved from the publicly available PhysioNet
database. The EMG signals were recorded by inserting a needle
electrode into the tibialis anterior muscle while the subject was
asked to dorsiflex the foot. The electrode was repositioned until
Motor Unit Potentials (MUP) were obtained. The EMG data was acquired
using a Medelec Synergy N2 EMG monitoring system [12].

B. Signal Processing: Dividing the EMG Signals into Samples
The EMG signals of healthy, myopathy and neuropathy subjects are
divided into samples, with 100 instances per sample for the healthy
subject and 200 instances per sample for the myopathy and neuropathy
subjects. The division yields 509 samples for the healthy subject,
552 samples for the myopathy subject and 740 samples for the
neuropathy subject, giving a total of 1801 samples across the three
classes.
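
The division step can be sketched as follows. This is only a minimal illustration: it assumes non-overlapping windows and drops a trailing partial window, neither of which is stated in the text, and `segment_signal` is a hypothetical helper name. The toy signal stands in for one EMG channel and is not real data.

```python
from typing import List, Sequence


def segment_signal(signal: Sequence[float], window: int) -> List[List[float]]:
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Assumption: a trailing partial window is discarded.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_full = len(signal) // window
    return [list(signal[i * window:(i + 1) * window]) for i in range(n_full)]


# Toy signal standing in for one EMG channel (not real data).
toy = [float(i) for i in range(450)]
healthy_samples = segment_signal(toy, 100)   # 100 instances per sample
disorder_samples = segment_signal(toy, 200)  # 200 instances per sample
print(len(healthy_samples), len(disorder_samples))
```

Each returned window would then be passed on to feature extraction as one sample.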

C. Feature Extraction
The following 15 Time Domain (TD) features are extracted from the
divided samples [11].

1. Enhanced Mean Absolute Value (EMAV): It is an enhancement of the
Mean Absolute Value (MAV) whose computation includes a threshold
value p.

2. Enhanced Wavelength (EW): It is an enhancement of the Wavelength
(W) whose computation includes a threshold value p.

3. Mean Absolute Value (MAV): This parameter represents the amplitude
of the signal. It is the average of the absolute values of the EMG
signal.

4. Wavelength (W): This parameter provides the cumulative length of
the signal waveform, obtained by summing the absolute differences
between neighboring samples.

5. Root Mean Square (RMS): This parameter directly reflects the
amplitude of the signal. It is computed as the square root of the
mean of the squared EMG signal.

6. Average Amplitude Change: It represents the average absolute
change in amplitude between adjacent samples of the EMG signal.

7. Difference Absolute Standard Deviation Value: It denotes the
standard deviation of the differences between adjacent samples.

8. Log Detector (LD): It provides an estimate of the force exerted
during the movement.

9. Modified Mean Absolute Value (MMAV): It is an extension of MAV in
which weights are assigned to the samples.

10. Modified Mean Absolute Value 2 (MMAV2): It is an extension of MAV
in which continuous weights are assigned to the samples.

11. Myopulse Percentage Rate: It is the mean of the myopulse output,
where the output is 1 when the absolute value of the signal exceeds a
predefined threshold and 0 otherwise.

12. Simple Square Integral (SSI): It is the sum of the squared values
of the EMG signal.

13. Variance of EMG: It is the average power of the signal.

14. Willison Amplitude: It counts the changes in the EMG signal
between adjacent samples that exceed a threshold value.

15. Maximum Fractal Length: The strength of the contraction is
specified by this parameter.
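
Several of the features above can be written in a few lines each. The sketch below uses definitions that are common in the EMG feature-extraction literature; they are assumptions here, since the text gives only verbal descriptions, and the function names and threshold values are illustrative. The enhanced and weighted variants (EMAV, EW, MMAV, MMAV2), the Log Detector and the Maximum Fractal Length are left out because the text does not give their parameters.

```python
import math
from typing import Sequence


def mav(x: Sequence[float]) -> float:
    """Mean Absolute Value: average of |x_i|."""
    return sum(abs(v) for v in x) / len(x)


def rms(x: Sequence[float]) -> float:
    """Root Mean Square: square root of the mean of x_i squared."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def wavelength(x: Sequence[float]) -> float:
    """Wavelength: sum of |x_(i+1) - x_i| over adjacent samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def aac(x: Sequence[float]) -> float:
    """Average Amplitude Change: wavelength divided by the sample count."""
    return wavelength(x) / len(x)


def dasdv(x: Sequence[float]) -> float:
    """Difference Absolute Standard Deviation Value."""
    squared = sum((b - a) ** 2 for a, b in zip(x, x[1:]))
    return math.sqrt(squared / (len(x) - 1))


def ssi(x: Sequence[float]) -> float:
    """Simple Square Integral: sum of x_i squared."""
    return sum(v * v for v in x)


def var(x: Sequence[float]) -> float:
    """Variance of EMG, assuming a zero-mean signal."""
    return ssi(x) / (len(x) - 1)


def wamp(x: Sequence[float], threshold: float) -> int:
    """Willison Amplitude: count of adjacent differences above threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(b - a) > threshold)


def myop(x: Sequence[float], threshold: float) -> float:
    """Myopulse Percentage Rate: fraction of samples with |x_i| above threshold."""
    return sum(1 for v in x if abs(v) > threshold) / len(x)


# Toy sample (not real EMG data); thresholds are arbitrary.
sample = [1.0, -2.0, 3.0, -4.0]
features = {
    "MAV": mav(sample), "RMS": rms(sample), "W": wavelength(sample),
    "AAC": aac(sample), "DASDV": dasdv(sample), "SSI": ssi(sample),
    "VAR": var(sample), "WAMP": wamp(sample, 4.0), "MYOP": myop(sample, 2.5),
}
print(features)
```

Applying these functions to every divided sample of each channel would give one feature row per sample.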

D. Annotating Features
The extracted features are annotated with class labels, where class
label 0 represents healthy subjects, 1 represents myopathy subjects
and 2 represents neuropathy subjects.

E. Feature Selection

i. Working Principle of Pearson Correlation Coefficient (PCC)
PCC measures the linear dependency between two variables. It is
computed by taking the covariance between a feature and the class
label and dividing it by the product of their standard deviations,
that is, the square root of the product of their variances [8].

ii. Output of Pearson Correlation Coefficient (PCC)
The correlation among the features and between each feature and the
class label lies in the range from -1 to 1, where a value
> nearer to zero refers to weaker correlation
> closer to 1 denotes stronger positive correlation
> closer to -1 represents stronger negative correlation
The nine prominent features, namely EW, MAV1, RMS1, W, RMS2, LD,
MMAV, MMAV2 and SSI1, are obtained from this feature selection
process.
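
A minimal sketch of correlation-based selection is given below. The coefficient follows the standard Pearson definition; the cutoff value, the feature names and the toy data are hypothetical, since the text does not state the threshold used to pick the prominent features.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (dev_x * dev_y)


def select_features(columns: Dict[str, Sequence[float]],
                    labels: Sequence[float],
                    cutoff: float) -> List[str]:
    """Keep features whose |correlation with the label| reaches the cutoff."""
    return [name for name, col in columns.items()
            if abs(pearson(col, labels)) >= cutoff]


# Toy data: six samples, labels 0/1/2 as in the annotation step.
labels = [0, 0, 1, 1, 2, 2]
columns = {
    "f_strong": [1, 2, 3, 4, 5, 6],
    "f_weak": [1, -1, 1, -1, 1, -1],
    "f_negative": [6, 5, 4, 3, 2, 1],
}
selected = select_features(columns, labels, cutoff=0.5)  # cutoff is assumed
print(selected)
```

The same `pearson` function applied to pairs of selected features would identify the mutually correlated ones.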


iii. Uncorrelated Features

From these nine prominent features, the uncorrelated features are
obtained by eliminating the features that are most correlated with
one another. The uncorrelated features obtained are Enhanced
Wavelength and Root Mean Square.


F. Splitting of Training and Testing Set
The annotated features are split into training and testing sets, with
70% of the data forming the training set and the remaining 30%
forming the testing set.
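
The 70:30 split, together with the standard scaling mentioned in the abstract, can be sketched as below. Shuffling before the split and fitting the scaler on the training set only are assumptions of this sketch, and the feature values are random stand-ins, not the extracted EMG features. With 1801 samples a 30% test share rounds to 540, which agrees with the support column of Table I.

```python
import random
import statistics


def split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle 0..n-1 and return (train_idx, test_idx)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def fit_scaler(rows):
    """Per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return means, stds


def scale(rows, means, stds):
    """Standard scaling: z = (x - mean) / std, column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


train_idx, test_idx = split_indices(1801)
print(len(train_idx), len(test_idx))  # 1261 540

# Random stand-in feature rows (two columns), not real data.
rng = random.Random(1)
rows = [[rng.gauss(5.0, 2.0), rng.gauss(-1.0, 0.5)] for _ in range(1801)]
train = [rows[i] for i in train_idx]
test = [rows[i] for i in test_idx]

means, stds = fit_scaler(train)           # statistics come from training data only
train_scaled = scale(train, means, stds)
test_scaled = scale(test, means, stds)    # test data reuses the training statistics
```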

G. Hyperparameter Tuning
The hyperparameters of the classifiers are tuned using the Bayesian
optimization technique.

i. Working Principle of Bayesian Optimization

The Bayesian optimization setup consists of an objective function, a
parameter space over which the search algorithm is applied, and a
trials database. The search algorithm used is the Tree of Parzen
Estimators (TPE) algorithm [9].
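
The interplay of objective function, parameter space and trials database can be illustrated with a deliberately simplified, TPE-style loop over a small categorical space. This is a teaching sketch and not the algorithm of any particular library: the objective is a made-up stand-in for a validation loss, candidates are drawn uniformly rather than from the density of good trials, and all names and constants are hypothetical.

```python
import random
from collections import Counter

# Hypothetical categorical space using two of the MLPNN parameters.
SPACE = {
    "activation": ["relu", "logistic", "tanh", "identity"],
    "solver": ["lbfgs", "adam"],
}


def toy_objective(cfg):
    """Stand-in loss; a real objective would return 1 - validation accuracy."""
    penalty = {"relu": 0.05, "tanh": 0.10, "logistic": 0.20, "identity": 0.40}
    return penalty[cfg["activation"]] + (0.0 if cfg["solver"] == "adam" else 0.05)


def random_config(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def density(group, name, value):
    """Smoothed relative frequency of `value` for `name` within a trial group."""
    counts = Counter(cfg[name] for cfg, _ in group)
    return (counts[value] + 1) / (len(group) + len(SPACE[name]))


def suggest(trials, rng, gamma=0.25, n_candidates=16):
    """Prefer the candidate that resembles good trials more than the rest."""
    ranked = sorted(trials, key=lambda t: t[1])
    n_good = max(1, int(gamma * len(ranked)))
    good, rest = ranked[:n_good], ranked[n_good:]

    def ratio(cfg):
        score = 1.0
        for name in SPACE:
            score *= density(good, name, cfg[name]) / density(rest, name, cfg[name])
        return score

    return max((random_config(rng) for _ in range(n_candidates)), key=ratio)


def optimize(n_warmup=4, n_iter=12, seed=0):
    rng = random.Random(seed)
    trials = []  # the "trials database": (config, loss) pairs
    for _ in range(n_warmup):          # random start-up evaluations
        cfg = random_config(rng)
        trials.append((cfg, toy_objective(cfg)))
    for _ in range(n_iter):            # informed suggestions
        cfg = suggest(trials, rng)
        trials.append((cfg, toy_objective(cfg)))
    return min(trials, key=lambda t: t[1]), trials


(best_cfg, best_loss), history = optimize()
print(best_cfg, best_loss)
```

The point of the sketch is that each new configuration is chosen from the record of earlier trials, in contrast to grid or random search.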

ii. Input Hyperparameter Space of MLPNN
The values of the hidden_layer_sizes and max_iter parameters are
specified using the trial and error method.
> activation: ['relu', 'logistic', 'tanh', 'identity']
> solver: ['lbfgs', 'adam']
> hidden_layer_sizes: [(1,), (2,), (3,), (4,), (5,), (6,), (7,),
  (8,), (9,), (10,), (11,), (12,), (13,), (14,), (15,), (16,), (17,),
  (18,), (19,), (20,), (21,)]
> learning_rate: ['constant', 'invscaling', 'adaptive']
> max_iter: [1000, 2000, 3000, 4000, 5000, 6000, ..., 20000]


iii. Input Hyperparameter Space of Random Forest
The values of the max_depth and n_estimators parameters are specified
using the trial and error method.
> criterion: ['entropy', 'gini']
> max_depth: [10, 100, 10]
> max_features: ['auto', 'sqrt', 'log2', None]
> n_estimators: [10, 60, 600, 1500]

H. Classification
Random Forest and MLPNN are considered for classification. Random
Forest is an ensemble technique that combines the predictions of
several decision trees, which is expected to be better than the
prediction of a single classifier, and MLPNN integrates the outputs
computed by its hidden layers.

i. Working Principle of Multi-Layer Perceptron Neural Network

The Multi-Layer Perceptron is a commonly used form of ANN [7]. It
consists of an input layer, one or more hidden layers and an output
layer. Each layer consists of neurons. The neurons in the input layer
distribute the input, and the neurons in the hidden layers perform
weighted computations and pass their output to the neurons in the
output layer [14].
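
The layer-by-layer computation described above can be shown with a tiny forward pass. The weights, the layer sizes, the ReLU hidden activation and the softmax output are illustrative assumptions; this is not the trained network from the experiments.

```python
import math


def relu(v):
    return max(0.0, v)


def dense(inputs, weights, biases, activation):
    """One layer: each neuron computes activation(w . x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(z):
    """Turn output-layer scores into class probabilities."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = dense(x, hidden_w, hidden_b, relu)             # hidden layer
    scores = dense(hidden, out_w, out_b, lambda v: v)       # output layer
    return softmax(scores)


# Toy network: 2 inputs, 2 hidden neurons, 3 output classes (0, 1, 2).
x = [1.0, 2.0]
hidden_w = [[1.0, 0.0], [0.0, -1.0]]
hidden_b = [0.0, 0.0]
out_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
out_b = [0.0, 0.0, 0.0]

probs = mlp_forward(x, hidden_w, hidden_b, out_w, out_b)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```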

ii. Working Principle of Random Forest
Training Phase: Several decision trees are trained in parallel on the
training data [10].

Testing Phase: The predictions of the decision trees on the testing
data are combined using the majority voting technique [10].
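
The majority voting of the testing phase can be sketched as follows. The per-tree predictions are invented for illustration, and ties are resolved here in favour of the label seen first, which is an assumption rather than something the text specifies.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tree_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-tree labels; tree_predictions[t][i] is tree t's label for sample i."""
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in tree_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Invented predictions of three trees on four test samples.
tree_predictions = [
    [0, 1, 2, 2],
    [0, 2, 2, 1],
    [1, 1, 2, 1],
]
forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)
```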


I. Performance Evaluation

i. Confusion Matrix: It provides the matrix of True Positives (TP),
True Negatives (TN), False Positives (FP) and False Negatives (FN).

ii. Precision: It is the ratio of True Positives (TP) to the total
number of True Positives (TP) and False Positives (FP).

iii. Recall: It is the ratio of True Positives (TP) to the total
number of True Positives (TP) and False Negatives (FN).

iv. F1 Score: It is the harmonic mean of Precision and Recall.
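
For the three-class problem these metrics can be computed per class from the confusion matrix, as in the sketch below. The labels are toy values chosen for illustration and are not the paper's results; rows of the matrix are true labels and columns are predicted labels.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """matrix[t][p] counts samples with true label t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def class_metrics(matrix: List[List[int]], k: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 score for class k."""
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp   # predicted k, true label differs
    fn = sum(matrix[k]) - tp                  # true k, predicted label differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def accuracy(matrix: List[List[int]]) -> float:
    """Share of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Toy labels: 0 healthy, 1 myopathy, 2 neuropathy.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
precision_1, recall_1, f1_1 = class_metrics(cm, 1)
print(cm, precision_1, recall_1, f1_1, accuracy(cm))
```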


IV RESULTS ANALYSIS 


The analysis of the proposed work is performed in three experiments,
which are as follows:

1. Experiment 1: Performs the classification analysis of the Bayesian
optimization tuned Random Forest and MLPNN with the Time Domain
feature set, which consists of 19 features.

2. Experiment 2: Performs feature selection on the extracted Time
Domain features using the Pearson correlation technique, and the
prominent features are fed to the Bayesian optimization tuned Random
Forest and MLPNN.

3. Experiment 3: Focuses on classification with the uncorrelated
features, which are obtained by eliminating the most correlated
features from the prominent features procured in Experiment 2.

Figure 2 presents the confusion matrices with the highest accuracy,
those of the scaled Multi-Layer Perceptron and the non-scaled Random
Forest obtained in Experiment 1, where the x axis represents the
predicted label and the y axis represents the true label. The numbers
of correctly predicted data points appear on the diagonal. Class
label 0 represents the healthy subject, 1 represents the myopathy
subject and 2 represents the neuropathy subject. Figure 3 depicts the
accuracy graph.


[Confusion matrix plots; x axis: predicted label (0, 1, 2), y axis: true label]

Figure 2 Best Accurate Confusion Matrix of Scaled MLPNN (Left) and Non-Scaled Random Forest (Right)
Figure 3 Accuracy Graph of Three Experiments.

TABLE I CLASSIFIER PERFORMANCE EVALUATION

EXP.NO  EXPERIMENT                                  CLASSIFIER                        PRECISION  RECALL  F1 SCORE  SUPPORT

NO SCALING OF FEATURES
1       TIME DOMAIN FEATURES ANALYSIS               BAYESIAN OPT TUNED RANDOM FOREST  0.96       0.96    0.96      540
                                                    BAYESIAN OPT TUNED MLPNN          0.86       0.86    0.86      540
2       PEARSON CORRELATION (PC) FEATURES ANALYSIS  BAYESIAN OPT TUNED RANDOM FOREST  0.88       0.89    0.88      540
                                                    BAYESIAN OPT TUNED MLPNN          0.67       0.67    0.64      540
3       UNCORRELATED FEATURES ANALYSIS OF           BAYESIAN OPT TUNED RANDOM FOREST  0.87       0.87    0.86      540
        PEARSON CORRELATION FEATURES                BAYESIAN OPT TUNED MLPNN          0.85       0.84    0.83      540

SCALING OF FEATURES
1       TIME DOMAIN FEATURES ANALYSIS               BAYESIAN OPT TUNED RANDOM FOREST  0.96       0.96    0.96      540
                                                    BAYESIAN OPT TUNED MLPNN          0.97       0.97    0.97      540
2       PEARSON CORRELATION (PC) FEATURES ANALYSIS  BAYESIAN OPT TUNED RANDOM FOREST  0.88       0.89    0.88      540
                                                    BAYESIAN OPT TUNED MLPNN          0.95       0.95    0.95      540
3       UNCORRELATED FEATURES ANALYSIS OF           BAYESIAN OPT TUNED RANDOM FOREST  0.87       0.87    0.86      540
        PEARSON CORRELATION FEATURES                BAYESIAN OPT TUNED MLPNN          0.84       0.84    0.83      540

a. Inferences from the Experimental Results:

1. Without feature scaling, Random Forest performs better than the
Multi-Layer Perceptron. Because Random Forest trains several decision
trees and combines their predictions, it is more accurate and robust
after its parameters are tuned.

2. Random Forest provides the same accuracy with and without scaling.
MLPNN outperforms Random Forest after scaling, with an accuracy of
97% for Time Domain features and 95% for Pearson correlation
features.

3. The Time Domain feature set provided the highest accuracy, 97% for
MLPNN and 96% for the Bayesian optimization tuned Random Forest,
compared with the Pearson correlation features and the uncorrelated
features.

4. The MLPNN classifier provided an accuracy of 97% and outperformed
Random Forest after the Time Domain features were scaled using the
Standard Scaler.

5. Random Forest provided a good accuracy of 96% without any feature
selection technique, since the feature importance attribute present
in tree-based classifiers is more accurate than the explicit feature
selection techniques.


V CONCLUSION AND FUTURE WORK 


Myopathy and neuropathy are neuromuscular disorders that affect the
cells of the muscle and the nerve. EMG-based diagnosis is essential
since it measures the muscle activity, and processing EMG requires an
appropriate signal processing technique. Machine learning techniques
improve the diagnostic accuracy by training models on the processed
EMG signals. The existing systems lack a focus on tuning the
classifier hyperparameters, which is the most important step in
obtaining an accurate and robust model. The proposed framework mainly
focuses on tuning the Random Forest and Multi-Layer Perceptron Neural
Network (MLPNN) classifiers using the Bayesian optimization
technique, which makes informed decisions at each iteration when
forming the combination of parameters and provides the optimal
parameters for the classifiers. Another problem identified in the
existing systems is the lack of analysis of the impact of features
using feature selection techniques. This problem is addressed in the
proposed framework by conducting three experiments, based on Time
Domain feature analysis, Pearson correlation feature analysis and
uncorrelated feature analysis. The Bayesian optimization tuned MLPNN
provides the highest accuracy of 97%, and Random Forest provides 96%,
with Time Domain features. A future enhancement of this work is to
build a robust diagnostic Machine Learning (ML) model by tuning the
parameters of different classifiers and combining the classifiers'
predictions using ensemble techniques, and also to perform the
analysis with different feature selection techniques.